Generic Processes to Use S3L

We will show you two generic ways to start using S3L:

  1. Experiment Framework
  2. Call Algorithms Directly

Experiment Framework

We provide built-in experiment processes for different semi-supervised settings and input data, such as inductive/transductive, WithGraph/WithoutGraph, givenDataSplit/randomlySplit, and so on. The experiment class implements the following process: load the data, split it, search over hyper-parameters, and evaluate the selected model on the test data. To accelerate the experiments, we also support multi-processing via joblib. The experiment framework allows you to evaluate supervised/semi-supervised learning algorithms in fewer than ten statements. For example,

from s3l.Experiments import SslExperimentsWithGraph
from s3l.classification.LPA import LPA


if __name__ == '__main__':
    configs = [
        ('LPA', LPA(), {
            'kernel': ['rbf'],
            'n_neighbors':[3,5,7]
        })
    ]

    datasets = [
        ('ionosphere', None, None, None, None)
    ]
    # (name, feature_file, label_file, split_path, graph_file)

    experiments = SslExperimentsWithGraph(n_jobs=1)

    experiments.append_configs(configs)
    experiments.append_datasets(datasets)
    experiments.set_metric(performance_metric='accuracy_score')

    results = experiments.experiments_on_datasets(
        unlabel_ratio=0.75, test_ratio=0.2, number_init=4)

    # do something with results #

The above code evaluates the Label Propagation algorithm on the built-in dataset ionosphere. The best model is searched with the rbf kernel and n_neighbors in the range [3, 5, 7]. Finally, the accuracy_score is reported in the local variable results.
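The exact structure of the returned results object depends on the experiment class; as a rough sketch (assuming, hypothetically, that it is a dict-like mapping of dataset names to per-algorithm scores), post-processing might look like:

```python
# Hypothetical post-processing of the experiment results.
# Assumption (not confirmed by the s3l docs): `results` maps
# dataset name -> {algorithm name -> metric score}.
def summarize(results):
    lines = []
    for dataset, algo_scores in results.items():
        for algo, score in algo_scores.items():
            lines.append("{} / {}: {:.3f}".format(dataset, algo, score))
    return lines


# Example with a mock result in the assumed shape:
mock = {'ionosphere': {'LPA': 0.912}}
print("\n".join(summarize(mock)))
```

Adjust the traversal to whatever structure `experiments_on_datasets` actually returns.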

Call Algorithms Directly

The built-in algorithms can be called directly, as in the sklearn package. The algorithms we have implemented are listed here. By reading the examples on an algorithm's module page, you can easily try out any semi-supervised algorithm you like. For example,

from s3l.classification.TSVM import TSVM
from s3l.metrics.performance import accuracy_score
from s3l.datasets import base, data_manipulate


if __name__ == '__main__':
    unlabel_ratio = 0.75  # fraction of the data left unlabeled
    datasets = [
        ('house', None, None),
    ]
    for name, feature_file, label_file in datasets:
        # load dataset
        X, y = base.load_dataset(name, feature_file, label_file)

        # split
        _, _, labeled_idxs, unlabeled_idxs = \
            data_manipulate.inductive_split(X=X, y=y, test_ratio=0.,
                            initial_label_rate=1 - unlabel_ratio,
                            split_count=1, all_class=True)

        labeled_idx = labeled_idxs[0]
        unlabeled_idx = unlabeled_idxs[0]

        tsvm = TSVM()
        tsvm.fit(X, y, labeled_idx)
        pred = tsvm.predict(X[unlabeled_idx])
        print("Accuracy_score: {}".format(
                    accuracy_score(y[unlabeled_idx], pred)))

The above code runs TSVM (Transductive Support Vector Machine) with default hyper-parameter settings, given features X, labels y, and the indices of labeled data ``labeled_idx``. Then, the prediction is evaluated with accuracy score on the unlabeled data.
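For reference, accuracy score here is simply the fraction of predictions that match the ground-truth labels; a minimal NumPy equivalent (a sketch, not part of s3l) is:

```python
import numpy as np


def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction equals the label.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```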